
AI: I ain’t got no body!

17 Jul 2025

Computational linguist Philipp Wicke explores how artificial intelligence can be taught a human understanding of the world, and how it might even learn to understand non-verbal communication.

The stupid thing about artificial intelligence is that it is not intelligent. For any kind of intelligence, that is a hard statement to parse. Either the sentence is tautologous – if machine intelligence is not intelligent, it is stupid – or it sounds like an oxymoron, a contradiction in terms: If it is stupid, why do we call it intelligence? What worries people who deal with AI is that both readings are correct: Artificial intelligence is at once glaringly stupid and breathtakingly intelligent.

Large language models (LLMs) are autocomplete experts, trained to predict the probability of the next word. To do this, the current generation, which includes tools such as ChatGPT, ‘sees’ an ever-growing prefix of text and anticipates what comes next. For example: “I understand”; “I understand what”; “I understand what AI”; “I understand what AI is”. Earlier models were instead fed texts containing gaps, so that they could learn to recognize the missing word in a sentence and check their own performance against the original. To fill in the gaps, they could draw automatically on virtually all the text published on the Internet.
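
To make the next-word idea concrete, here is a minimal sketch (ours, not from Wicke’s research) of how a small, publicly available language model assigns probabilities to possible continuations of a prompt. The model name "gpt2", the prompt and the use of the Hugging Face transformers library are illustrative assumptions; systems such as ChatGPT work on the same principle at a vastly larger scale.

# Illustrative sketch only: probing next-word probabilities with a small causal
# language model (assumed here: GPT-2 loaded via the "transformers" library).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "I understand what AI"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits[0, -1]   # scores for the next token only
probs = torch.softmax(logits, dim=-1)

# Print the five tokens the model considers most likely to come next.
top = torch.topk(probs, 5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {p.item():.3f}")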

Researching the blind spots of artificial intelligence: AI expert Philipp Wicke. © LMU/Stephan Höck

AI does not grasp the meaning of its own statements

Self-supervised learning is fundamentally different from the way in which humans assimilate knowledge. It is also far less efficient in terms of the materials and energy consumed. Be that as it may, LLM-based tools such as ChatGPT, Perplexity, Gemini, DeepSeek and xAI’s Grok certainly deliver astonishing results: AI can analyze terabytes of data in a fraction of a second, something that is inconceivable for humans. What AI doesn’t have, however, is the ability to understand context beyond the confines of its training data. It does not grasp the meaning of the texts which it autocompletes.

That is why users never really know what they are dealing with: an exceptionally proficient advisor or a dumb data collector that neither accumulates knowledge for itself nor learns from experience? AI is clever, but it doesn’t get cleverer. It expresses itself like a human, but it isn’t a human. The resultant uncertainty often leads users into a field of perception that has been termed the ‘uncanny valley’.

Studying AI’s ‘blind spots’

This is precisely where Dr. Philipp Wicke's research work comes in. He is, if you like, a professional scout who can guide you through the ‘uncanny valley’ of AI.

Wicke is a cognitive scientist. His trajectory to date has taken him from Osnabrück, where he attended courses in mathematics, informatics, psychology, philosophy, artificial intelligence and computational linguistics, to University College Dublin, where he investigated computational creativity and earned his doctorate in 2021 with a thesis on storytelling robots. Topics in embodied cognition, such as non-verbal communication and its integration into AI systems, are core aspects of his research.


Today, Wicke works under Professor Hinrich Schütze, Chair of Computational Linguistics and Co-Director of the Center for Information and Language Processing (CIS) at LMU Munich.

He has conducted research into the blind spots of artificial intelligence, and he is seriously committed to making AI (more) intelligent. Right now, he serves as Junior Researcher in Residence with a research grant at LMU’s Center for Advanced Studies (CAS). At the start of September 2025, he will lead a conference he organized himself on “Non-Verbal Behavior and Embodiment in Human-AI Communication”.

What makes humans’ understanding of the world unique

Wicke’s approach is based on the observation that the way humans understand the world depends largely on the body we are in, the environment we are in and how this environment influences our perception. That said, he argues that the rapid development of artificial intelligence in general and large language models (LLMs) in particular has almost entirely neglected the fact that AI is completely devoid of this spatial-physical relationship with the world and, hence, of an essential element of our communication. It is this relationship that has spawned the metaphors we use in our languages, for example. Artificial intelligence usually fails to understand newly created metaphors because it – quite literally – cannot grasp them.

People understand metaphors such as “her words hit him like a hammer blow” because they know physically what a hammer blow feels like. But AI models do not have a body. Investigating the relevant LLMs in light of this realization, Wicke was able to show that statistical learning from textual data is indeed sufficient to simulate certain aspects of human cognition. As a result, the language models can also “understand” conventionalized metaphors (such as an “icy look”), because these occur with considerable frequency in the training data. However, they still have trouble with creative neologisms (“time devoured her memories”) and metaphors that are tied to a given culture (“as sweet as baklava”).

Larger models (such as GPT-3) also manage to interpret conventionalized metaphors built on embodied verbs (to run, to fall) better than metaphors built on abstract verbs (to think, to exist). Wicke’s work shows that the more “embodied” a verb is (such as to dance or to grab), the easier it is for AI systems to correctly interpret the associated metaphors, provided the models are large enough. One explanation is that more embodied concepts are also more strongly lexicalized in the language people use; a broader and deeper understanding of these concepts therefore exists, yielding a solid construct that the language model can recreate. Smaller models (such as GPT-2), in contrast, often fall at this hurdle: a positive effect can be seen only upward of around a billion parameters.
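
To give a rough idea of how such metaphor probes can be set up computationally, here is a small sketch, an illustrative assumption rather than Wicke’s actual experimental code: it scores two candidate readings of a metaphor by their likelihood under a small language model and reports which reading the model prefers. The model name, prompt format and example sentences are invented for illustration.

# Illustrative sketch: does a small LLM prefer the embodied reading of a metaphor?
# Each candidate paraphrase is scored by its average log-likelihood under the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"   # a small model; as noted above, larger models fare better
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def avg_log_likelihood(text: str) -> float:
    # The model's loss is the mean negative log-likelihood per token, so negate it.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()

metaphor = "Her words hit him like a hammer blow."
candidates = {
    "embodied reading": "Her words hurt him deeply.",
    "literal distractor": "Her words were spoken very loudly.",
}

scores = {label: avg_log_likelihood(f"{metaphor} In other words: {text}")
          for label, text in candidates.items()}
print(scores)
print("Model-preferred reading:", max(scores, key=scores.get))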

The purely text-based training of these models nevertheless neglects the reality that human communication is multimodal: It is not based on words alone, but also depends on gestures, facial expressions and body language to convey meaning, intent and emotions. Having no physical body, LLMs lack the understanding needed for these non-verbal signals.

Training AI to help it communicate more naturally


At the Center for Advanced Studies, Wicke is therefore now concentrating on the aspects beyond language itself that underpin communication and convey information. He is interested in “how things are said”. That includes body language, body movements, gestures and hand signals, facial expressions, eye movements and paraverbal signals such as tone of voice, volume, speech rate and pauses.

This “how” is grounded in our physical experience of the world, so there is always an element of spatial reference. If AI systems are to better recognize gestures and spatial metaphors, they need to find descriptions of bodily experiences in their training data so that they can incorporate them. Metaphorical turns of phrase such as “time flies”, for example, would then be just as readily understood as specific gestures.

“You have to explicitly get the AI models in shape for this kind of non-verbal communication to enable more intuitive, more natural human-AI interaction,” Philipp Wicke says.

AI has no intercultural understanding


There is something else that LLMs still have to learn: They often do not know what cultural codes and contexts are. That matters, because metaphors and non-verbal signals are not only context-sensitive but also shot through with cultural significance. Yet although LLMs have been trained on gigantic volumes of text, they often lack an understanding of how metaphors, gestures and paraverbal signals are tied to specific cultures – in particular because they have mostly been trained on texts from what are known as the WEIRD (Western, Educated, Industrialized, Rich and Democratic) societies. Then again, this problem is by no means exclusive to LLMs.

What is even more irritating, however, is when LLMs interpret metaphors incorrectly or display facial expressions or gestures that do not fit the cultural context in which they are being used at a given time. The bodies of text on which LLMs are trained come mainly from the Western/North American context, quite simply because these are available in the largest volumes. For precisely this reason, other cultures are underrepresented in the training data.

It follows that making the ‘uncanny valley’ less uncanny is important on two counts when developing AI applications, robots, virtual assistants and computer-generated characters. Why? Because it has such a powerful influence on both user acceptance and trust in these technologies.

Why AI threatens to become more stupid

At the end of our interview, the fact that AI does not communicate “authentically” brings another thought to Philipp Wicke’s mind: “If LLMs draw their training data from the bodies of text available on the Internet, and if this material is increasingly inundated by those texts that LLMs themselves have created, then AI will soon be training primarily with its own concoctions – i.e. with all the hallucinations, half-truths and unverified assertions that we encounter today in our dealings with AI. In the long run, this will inevitably erode the LLMs’ ‘intelligence’.”

Incidentally, the first sentence of this article was fed into an AI with a request for an expert appraisal. The sentence was this: “The stupid thing about artificial intelligence is that it is not intelligent.” And the AI’s response was this: “Witty and critical this statement may be, but it is not quite correct, because AI is very powerful in many areas, even if it is not sentient.” When confronted with the argument that non-sentient capabilities are not intelligence, the AI defended itself with the following words: “Non-sentient capabilities are not the opposite of intelligence, but rather a sophisticated expression thereof.” Well, there you are!
